CMBfit: Rapid WMAP likelihood calculations with normal parameters 
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We present a method for ultra-fast confrontation of the WMAP cosmic microwave background 
observations with theoretical models, implemented as a publicly available software package called 
CMBfit, useful for anyone wishing to measure cosmological parameters by combining WMAP with 
other observations. The method takes advantage of the underlying physics by transforming into a set 
of parameters where the WMAP likelihood surface is accurately fit by the exponential of a quartic 
or sextic polynomial. Building on previous physics based approximations by Hu et al, Kosowsky 
et al. and Chu et al., it combines their speed with precision cosmology grade accuracy. A Fortran 
code for computing the WMAP likelihood for a given set of parameters is provided, pre-calibrated 
against CMBfast, accurate to Aln£ ~ 0.05 over the entire 2a region of the parameter space for 6 
parameter "vanilla" ACDM models. We also provide 7-parameter fits including spatial curvature, 
gravitational waves and a running spectral index. 



I. INTRODUCTION 



The impressive sensitivity of the long awaited Wilkin- 
son Microwave Anisotropy Probe (WMAP) data allows 
for unprecedented constraints on cosmological models 
[1-6]. The measurements have strengthened the case for 
the cosmological concordance model [6], the inflationary 
ACDM model, a flat Universe which is currently accel- 
erating due to mysterious dark energy, where most mat- 
ter is in the form of collisionless cold dark matter, and 
where the initial conditions for the fluctuations are adi- 
abatic. Because of well known cosmic microwave back- 
ground (CMB) parameter degeneracies [18,5,11], the true 
impressiveness of the data is most clearly demonstrated 
by either imposing reasonable priors, combining the data 
with complimentary data sets, or both. The most re- 
cent precision data-sets are the Sloan Digital Sky Survey 
(SDSS) galaxy power spectrum (SDSS) [7] and a new Su- 
pernovae la compilation [10], and combining these with 
the WMAP constraints have further narrowed error bars 
[8] , giving us cosmological parameters at a precision not 
thought possible only few years ago. 

So how are CMB data-sets used to estimate parame- 
ters? Although constraints on cosmological parameters 
from the CMB can in principle be extracted directly from 
the maps themselves, this is effectively prohibited by the 
huge computational power necessary to perform the re- 
quired likelihood calculations (although for a novel ap- 
proach, see [15]). Instead, the more efficient parameter 
estimation method commonly employed uses the angu- 
lar power spectrum of the map as an intermediate step 
[12-14]. 

Likelihoods on the basis of power spectrum constraints 
are much faster to calculate, the slowest step being the 
computation of the angular power spectrum numerically. 
This can be done either through integration of the full 
Boltzmann equation (with CMBfast [22] or the modifi- 
cations CAMB [24] or CMBEASY [28]) or using an ap- 
proximate shortcut such as DASh [27]. Although steady 
improvements in both computer power and algorithm 



performance have made these calculations significantly 
faster, the CMB power spectrum likelihood calculation 
is still the bottleneck in any parameter estimation pro- 
cess by many orders of magnitude. 

Another problem with the estimation process are the 
near-degeneracies between some cosmological parame- 
ters. Elongated, banana shaped contours on the likeli- 
hood surface make search algorithms less efficient. Sev- 
eral authors have advocated various transformations 
from cosmological parameters to "physical" parameters 
more directly linked to features in the power spectrum 
[18,20]. Chu et al. [21] advocated a new such set of "nor- 
mal" parameters whose probability distributions are well 
fitted by Gaussian distributions. 

Our key idea explored in this paper is that if the like- 
lihood function is roughly a multivariate Gaussian in the 
transformed parameters, then it should be very accu- 
rately approximated by the exponential of a convenient 
higher-order polynomial. A Gaussian likelihood surface 
C corresponds to the log-likelihood ln£ being quadratic, 
and a quadratic Taylor expansion is of course an accu- 
rate approximation of any function near its maximum. 
Very far from the maximum, the Gaussian approximation 
again becomes accurate, since both it and the true like- 
lihood L are vanishingly small. We will see that in most 
cases, small cubic and quartic corrections to In/} help 
provide a near-perfect fit to C everywhere. The WMAP 
team employed such quartic fits to determine reasonable 
step sizes for their Markov Chain Monte Carlo search al- 
gorithm [6] . Here we go further and show how polynomial 
fits can conveniently store essentially all the cosmological 
information inherent in the WMAP power spectra. This 
reduces calculation of CMB likelihoods to simply eval- 
uating the exponential of an n th order polynomial. Al- 
though CMBfast still needs to be run in order to obtain 
a sufficient sampling of the likelihood surface, this only 
has to be done once for each model space and dataset. 
For the models explored in this paper, it has already 
been done and the polynomial coefficients have been ob- 
tained. The polynomial fit can thus be used to run Monte 
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Carlo Markov Chains (MCMC) including various non- 
CMB data-sets many orders of magnitude faster, dramat- 
ically reducing the need for computing power for cosmo- 
logical parameter estimation purposes involving WMAP 
data. This is an improvement on importance sampling 
methods which notoriously depletes the chain of points 
as the added data sets shift or narrow the confidence re- 
gions. It means joint likelihood analyses between WMAP 
and other surveys which would previously take weeks or 
months of computer time on a good workstation being 
finished in an afternoon. 

As mentioned, the last few years have seen the rise 
of a cosmological concordance model [25,5,8], the ACDM 
flat inflationary cosmology which has been confirmed and 
strengthened by WMAP, SDSS and new supernova ob- 
servations. However as cosmological ideas and trends 
change, we choose to keep an open mind and allow for 
extensions to the 6-parameter inflationary ACDM model 
by including models with spatial curvature, gravitational 
waves and a running scalar spectral index. To allow the 
user to include CMB polarization information separately, 
we also perform separate likelihood fits excluding and in- 
cluding WMAP polarization data for the 6 parameter 
scenario. 



II. METHOD 

Our approach in this paper consists of three steps: 

1. Acquire a sample of the likelihood surface as a 
function of cosmological parameters. This is done 
through a Markov Chain Monte Carlo sampling of 
parameter space. 

2. Transform from cosmological parameter space into 
normal parameters. This will make the likeli- 
hood surface close to Gaussian in these parameters, 
thereby increasing significantly the accuracy of the 
polynomial fit. 

3. Fit of the log-likclihood surface to an n th order 
polynomial. The polynomial degree n is optimized 
to the the likelihood-surface sampling density using 
a training set/test set approach. 

Below we describe each of the above steps in detail. 

A. The Cosmological Parameters 

We follow standard work [5,25] in the field and param- 
eterize our cosmological model with the 13 parameters 

P = {r,Cl k ,Cl A ,w,u) d ,u) b ,f v , A s ,n s ,a,r,n t ,b) (1) 

These parameters are the reionization optical depth, t, 
curvature energy density flk, the dark energy density Oa, 
the dark energy equation of state w, the physical dark 



matter and baryon densities ujd = fldh 2 and u>b = f^/i 2 , 
the fraction of dark matter that is warm (massive neu- 
trinos) f v , the primordial amplitudes and tilts of the 
scalar and tensor fluctuations respectively A s , n s , A t , n t 
and the running of the scalar tilt a. Here A s ,n s and a 
are defined by the Ansatz P*(k) = A s (k/k eval ) n °+ aink , 
and similarly for the tensor case, b is the galaxy bias 
b 2 = Pgaiaxy{k)/ P{k). For a comprehensive cosmological 
parameter summary, see Table 1 of [8]. 

As stated in the introduction, we base our work 
around the adiabatic ACDM cosmological model, 
{r, ^d, ^b, n s , A s }, a 6-parameter subspace of the 13 
parameters. This is close to the minimal number of free 
parameters needed explain the data (the one exception 
being n s , which is still consistent with unity) and assumes 
a pure cosmological constant and negligible spatial curva- 
ture, tensor fluctuations, running tilt, warm dark matter 
or hot dark matter As [5,8], we also consider models with 
added "spice" such as curvature, tensor contributions and 
running tilt. We confine ourselves to a maximum of 7 pa- 
rameters per model, i.e., to models with ACDM + a 7 th 
free parameter. 

B. The Likelihood 

The fundamental quantity that one wants to estimate 
is the probability distribution function (PDF) of the 
parameters vector p, P(p|d), given the data, d, and 
whatever prior assumptions and knowledge we may have 
about the parameters. The quantity we directly evalu- 
ate, however, is the probability of measuring the data 
given the parameters, P(d|p) through a goodness-of-fit 
test. It is this distribution, when thought of as a func- 
tion of the parameters, that we refer to as the likelihood, 
£(p) = P(d|p). The probability distribution function for 
the parameters is then related to the likelihood through 
Bayes' theorem: 

P(p|d) cx P(d|p)P p „ or (p) (2) 

where P pr ior is the prior probability distribution of the 
parameters. 

For ideal, full-sky, noiseless experiments, exact likeli- 
hood calculation is simple and fast. However, due to fore- 
ground contamination (the Galaxy, point sources etc.), 
only a fraction of the sky can be used for analysis. This 
leads to correlations between different multipoles and it 
becomes computationally prohibitive to calculate the ex- 
act likelihood function. Consequently, various approxi- 
mations exist on which much work has been focused [6,16] 
(for an excellent review of CMB likelihood calculations 
see [17]). In all our WMAP likelihood calculations we 
employ the latest version of the likelihood approximation 
routines supplied by the WMAP team. These routines 
take all effects into account, use an optimal combina- 
tion of the various approximations, and are well tested 
through simulations [6]. As input, they take the CMB 
power spectrum, which we compute with CMB fast. 
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C. Fitting to an n order polynomial 

Let us now look at how the polynomial fit is performed 
in detail. We start with a sample of N points pi, ...,pjv 
in the cZ-dimensional parameter space where the likeli- 
hood £(p) has been evaluated, and wish to fit ln£(p) to 
a polynomial. Let (p) denote the average of the parame- 
ter vectors Pi in our sample and let C = (pp*) — (p)(p)' 
denote the corresponding covariance matrix. To improve 
the numerical stability of our fitting, we work with trans- 
formed parameters that have zero mean and unit covari- 
ance matrix 1 : 

z = E(p-<p», (3) 

where E is a matrix satisfying ECE* = I so that (zz*) = 
E ((P- (P))(P- (p))*) e * = I) the identity matrix. There 
are many such choices of E — we make the choice where 
the rows of E are the eigenvectors the parameter covari- 
ance matrix C divided by the corresponding eigenval- 
ues, so our transformed parameters z can be interpreted 
as simply uncorrelated eigenparameters rescaled so that 
they all vary on the same scale (with unit variance). 
This transformation turns out to be crucial, changing 
the matrix-inversion below from quite ill-conditioned to 
numerically well-conditioned. 

A n th order polynomial in these d transformed param- 
eters has M = ( n+d ) = terms: 

\ n ) n\d\ 

y = \ogC = q a +Y^ <liZi + Q l 2 ll2z h z i2 + 

i ii<i2 

+ ]T q^-^z il z i2 ---z id (4) 

ii<i 2 <-<id 

We assemble all necessary products of Zi's into an M- 
dimcnsional vector 

x = {1, zi, ■ ■ ■ , Zd, zizi, • • • , IU=i,nZi}, (5) 

and the corresponding coefficients q into another M- 
dimensional vector 

q = {go,«iV--,g?,<ZiV--,^--V--} ; (6) 

which simplifies eq. (4) to 

y = x • q. (7) 

We now assemble the N measured log-likelihoods j/j from 
our Monte Carlo Markov Chain into an iV-dimensional 
vector y and the corresponding x- vectors into an N x M- 
dimensional matrix X, so the iV-dimensional vector of 
residual errors e from our polynominal fit is 

e = y - Xq. (8) 



We choose the fit that minimizes the rms residual, i.e., 
that minimizes |e| 2 . Differentiating with respect to q 
gives the standard least-squares result 

q=(X t X)- 1 X* y . (9) 

Thus the minimizing the sum of squares in the end comes 
down to the inversion of an M x M matrix. The size of 
the matrix, M — (n + d)\/n\d\, depends on both number 
of parameters and the polynomial degree, ranging from 
M = 210 for a 6 parameter 4th order fit to M = 1716 for 
a 7 parameter 6th order fit (see table I for the number of 
coefficients for various relevant cases). 

Chu et.al. [21] have illuminated the problems of fitting 
a Gaussian directly to the 6- (or higher) dimensional like- 
lihood surfaces and have argued that the surfaces may be 
too sparsely sampled in these dimensions. Consequently 
[21] fits to the 2D marginalized distributions and recon- 
structs the 6D likelihood function from this. Our inter- 
pretation is that the difficulties with fitting a quadratic 
polynomial to the 6 or 7 dimensional log-likelihood sur- 
face shows that the likelihood surface deviates too much 
from a Gaussian and that a higher order polynomial is 
required to reproduce the likelihood to sufficient accu- 
racy. This interpretation is shared by [6] who use a 4 th 
order polynomial to calculate MCMC step sizes. 

It is an advantage of our approach that we do not rely 
strictly on the chain we fit to being a fair statistical sam- 
ple of the likelihood. Indeed we only need the value of 
the likelihood at a sufficient number of points, and we are 
as such insensitive to statistical errors such as sampling 
errors and poor mixing. The way that the input points 
Pi sample parameter space tells the fitting algorithm how 
important we consider residuals in various places. Since 
our points are distributed as the WMAP likelihood it- 
self, the fit will be accurate in those parts of parameter 
space that are consistent with the data. If our fits are 
combined with complementary data sets, high accuracy 
is of course only necessary in the small jointly allowed 
region of parameter space, and this accuracy can option- 
ally be further improved by including non-CMB data to 
determine how to sample the CMB likelihood surface. 

Clearly the sample size (the number of steps in the in- 
put Monte Carlo Markov Chain) along with the dimen- 
sionality of the parameter space determines how densely 
the likelihood surface is sampled. In order to make a 
best possible fit for the 7 parameter models we there- 
fore include the points from the 6 parameter chain in the 
fit, thus placing extra statistical weight on the vanilla 
parameter substance. This allows us to use the higher 
parameter fit to get excellent results for the 6 parameter 
case as well in addition to reducing polynomial artifacts. 



1 The advantage of diagonalising the parameter covariance matrix was also pointed out by [21] 
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Parameters 


n = 2 


n = 3 


n = 4 


n — 5 


n — 6 


5 


21 


56 


126 


252 


462 


6 


28 


84 


210 


462 


924 


7 


36 


120 


330 


792 


1716 


8 


45 


165 


495 


1287 


3003 


9 


55 


220 


715 


2002 


5005 


10 


66 


286 


1001 


3003 


8008 


11 


78 


364 


1365 


4368 


12376 



TABLE I. Number of coefficients for a model with d parameters fitting to an n th order polynomial. The number of coefficients 
range from 28 for a 6 parameter 2 nd order fit to a whopping 12376 for an 11 dimensional model and 6 th order polynomial. 



Of course the polynomial has complete freedom out- 
side the sampled region, which means that for degree 
n > 2 the fit will generally blow up in regions far from 
the origin. This means that once a search algorithm ven- 
tures outside the allowed region, it may find unphysical 
areas of huge likelihoods, much higher than the real max- 
imum. We find that this artifact is efficiently eliminated 
by replacing the polynomial fit by a Gaussian y = e~ r I 2 
outside some large radius r = |z| = {z\ + • • • + zf) 1 / 2 in 
the transformed parameter space. 

To ensure that we do not introduce significant polyno- 
mial artifacts within the sampled region, we use a stan- 
dard training set/test set approach. We run the fit on, 
e.g., 70% of the chain and test the fit on the remainder 
of the chain. As the polynomial degree is increased, the 
training errors will inevitably get smaller since there are 
more degrees of freedom, while the polynomial eventually 
develops unphysical small-scale wiggles in between sam- 
ple points. This problem is quantified by measuring the 
errors in the test set, allowing us to identify the optimal 
polynomial degree as the point where the test set error 
is minimized. In the limit of very large sample size N, 
the test and training errors approach the same value for 
any given polynomial degree n. 

D. Markov Chain Monte Carlo 

To make an accurate polynomial fit, we need a suffi- 
ciently large sample of the likelihood surface as a function 
of the cosmological parameters. This is done by Markov 
Chain Monte Carlo (MCMC) sampling of the parameter 
space through the use of the Metropolis-Hastings algo- 
rithm [31-40]. When implemented correctly, this is a very 
effective method for explorations of parameter space, and 
we briefly review the concept here. What we want is to 
generate samples Pi, i —0, 1, 2, ... from the probability 
distribution P(p) of eq. (2). The method consists of the 
following steps: 

1 . We start by choosing a starting point po in param- 
eter space, and evaluate the corresponding value of 
the probability distribution P(po). 

2. Next we draw a candidate point Pi+i from a pro- 
posal density Q(Pi+i|Pi) and calculate the ratio 



Q(p»|p»+i) p (p»+i) 
Q(pi+i|p») P(Pi) 



3. If a > 1 we accept this new point, add the new 
point to the chain, and repeat the process start- 
ing from the new point. If a < 1, we accept the 
new state with probability a, otherwise we reject 
it, write the current state to the chain again and 
make another draw from the proposal density Q. 

After an initial burn-in period which depends heavily on 
the initial position in parameter space (the length of this 
burn-in can be as short as 100 steps whilst some chains 
still have not converged after several thousand steps), 
the chain starts sampling randomly from the distribu- 
tion P(x), allowing for calculation of all relevant statis- 
tics such as means and variances of the parameters. The 
choice of proposal density is of great importance for the 
algorithm performance as it defines a characteristic step 
size in parameter space. Too small a value and the chain 
will exhibit poor mixing, an excessive step size and the 
chain will converge very slowly since almost all candidate 
points get rejected. The acceptance ratio is a common 
measure of how successful a chain is. However, whereas 
a low acceptance ratio certainly demonstrates poor per- 
formance, high acceptance ratio can be an artifact of 
too small step size, which makes successive points in the 
chain highly correlated. As discussed in [8], a better fig- 
ure of merit is the chain correlation length, as it deter- 
mines the effective number of independent points in the 
chain. 

Our particular MCMC implementation is described in 
Appendix A of [8], and gradually optimizes the proposal 
density Q(p'|p) using the data itself. Once this learn- 
ing phase is complete (typically after about 2000 steps, 
which are then discarded), our proposal density Q(p'|p) 
is a multivariate Gaussian in the jump vector p' p 
with the same covariance matrix as the sample points 
Pi themselves. This guarantees optimum performance 
of the Metropolis algorithm by minimizing the number 
of jumps outside high confidence regions, whilst still en- 
suring good mixing. A very similar eigenbasis approach 
to jumping has been successfully used in other recent 
MCMC codes, notably [21,32,41]'. 
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For each case described in the next section we generate 
several chains, with different initial conditions, which we 
pass through a number of convergence and mixing tests 
given in [6] and [31]. 



E. CMB observables and normal parameters 



The issue of what we can actually measure from the 
CMB is of fundamental importance, since it helps to clar- 
ify which constraints come directly from the CMB and 
which constraints can only be found by combining CMB 
with other cosmological probes. This has been studied 
in detail by Hu et al. [18], who suggested that much of 
the then available information in the temperature power 
spectrum could in fact be compressed into only four ob- 
servables, the overall horizontal position of the first peak 
plus 3 peak height ratios. This work was re-visited by 
the WMAP team [4] and others [29,30] have similarly 
studied the effects of the parameters, including dark en- 
ergy, on CMB peak locations and spacings. From such 
studies one can understand the degeneracies between cos- 
mological parameters, by studying their effect on these 
quantities. 2 

Kosowsky et al. [20] go further along these lines and 
propose a set of "physical" parameters to which the 
power spectrum Cgs have an approximately linear re- 
sponse. This allows for fast calculation of power spectra 
around the fiducial model. The approach was taken fur- 
ther by [21] in realising that a linear response to these pa- 
rameters in the C/s should result in the logarithm of the 
likelihood function being well represented by a 2nd order 
polynomial, i.e., the likelihood function should be close 
to Gaussian in these parameters. This approach resulted 
in another, similar set of parameters dubbed "normal", 
since they had an approximately normal distribution. 

In this work, we use a best of all worlds approach 
and employ a core set of normal parameters which are 
a combination of the choices available in the above men- 
tioned literature, with some improvements. We use a 
core set of 6 parameters corresponding to the flat ACDM 
model. Specifically, {r, ^a, Qdh 2 , Qbh 2 , n s , A s } cast into 
{e~ 2r , Qf , hs, fi2,t, A p }. These new parameters are the 
physical damping due to the optical depth, an analytic fit 
to the angle subtended by the acoustic scale, the lst-to- 
3rd peak ratio, the lst-to-2nd peak ratio with tilt depen- 
dence removed, the physical effect of n s (e.g., tilt), and 
the fluctuation amplitude at the WMAP pivot point. We 
will now go through them in detail one by one. 



1. The Acoustic Scale Parameter, Q s 

The comoving angular diameter distance at decoupling 
is given by 



DA(adec) 



-f 

H J a, 



dx 



(ii) 



where we have ignored radiation density since the mo- 
ment of interest, decoupling, is well within matter dom- 
ination. Note that the integrand is a function of only 
three parameters, f^, fl^e & n d w. If we assume that the 
scale factor at decoupling is constant, the integral is also 
dependent upon only these three parameters. 

Although a numerical evaluation of the above integral 
is trivial, it is not as fast as we would wish and would 
quickly become the dominant obstacle in the polynomial 
likelihood calculation. There are several reasonably good 
analytic fitting formulae out there [18]. However none of 
them are accurate enough for our needs, so we also per- 
form a polynomial fit for Da- We rewrite the expression 
for the angular diameter distance as 



D A (a) « D A (a) E = 



xd{n k ,n A , w ) (12) 



where d is an analytic approximation to the integral and 
equals 1 for a matter dominated universe. We factor out 



vOro = \/l — ^fc — to remove a troublesome inverse 
square root which is difficult to fit with a polynomial ex- 
pansion. We fit d(flk, £^a, w) with a 5*^ order polynomial 
expansion, done by calculating d numerically for several 
hundred thousand points in (fife , f^A , w)-space and fitting 
to this sample. Our main errors then come from the as- 
sumption of a constant recombination redshift; however 
this fit is good to the ~ 0.1% level, and performs notably 
better than the fit of [18] for non-flat (fife ^ 0) and dy- 
namical dark energy (w ^ —1) scenarios. This fit can 
be downloaded as part of our publicly available CMBfit 
software package. 

The comoving sound horizon at decoupling is defined 

as 



Jo 



c s (t)dt 
a(t) 



(13) 



where c s (t) is the sound speed for the baryon-photon fluid 
at time t, well approximated by 



= g(l + 3p 6 /4p 7 )- 1 . 



(14) 



Using the relation dt/a = da/(a 2 H) and the Friedman 
equation, we can write this similarly to eqn.(ll) 



2 This work pre-dates WMAP and other recent CMB surveys. The enormous improvement in the data will by itself have 
reduced the degeneracies to some degree, adding more information to the power spectrum. 



•5 



r s {a d ec) = 



h 2 = H 2 /2A2 n ° 



+ a-)] 1/2 

(15) 



= 0.0264c^°' 762 X e -°- 4 76(i™(25.5w b + 1.84w m )) 2 



(23) 



If we assume that vacuum energy can be ignored at last 
scattering, this can be readily integrated to give 



r s = 



2V3 



(i?*(l + z*)y 1/2 In 



1 + \fr~Jl* 



(16) 



where the photon-baryon and radiation-matter ratios at 
last scattering are given by [19,18] 



r* = 



4p 7 ( 2 *) 

_ Pr{z*) 



with a redshift of last scattering 
z* w 1048 (1 + 0.00124w 



= SO^^/IO 3 )" 1 
= 0.042^ m 1 (z»/10 3 ), 



0.738N 



(1 



.9i = 0.0783o; 6 



(1 + 39.5 Wb 0763 ) 



-1 



.92 = 0.560 (l + 21. lw. 



1.811 



(17) 
(18) 

(19) 
(20) 
(21) 



The sound horizon and the angular diameter distance 
at the time of decoupling combine to give the angle sub- 
tended by the sound horizon at last scattering. In de- 
grees, it is given by 



e. 



r s (a de 



180 



DA(adec) 



(22) 



This has been verified to be an excellent choice for a 
normal parameter [20,21] and is indeed one of the best 
constrained cosmological parameters [5,7]. We use this 
parameter, except replacing the exact integrals Z?A(«dec) 
and r s (adec) with their analytic approximations, equa- 
tions (12) and (16). 



2. The peak ratios hi, hz and the scalar tilt parameter t 

Hu et al. [18] (and the recent re-analysis of [4]) define 
parameter fits to the ratios of the 2nd and 3rd peaks to 
the first peak. Again these are near ideal normal param- 
eters since they are directly measurable from the power 
spectrum. However there is one problem. Our require- 
ment is a parameter set with which to replace the cos- 
mological parameters. H 3 is mostly dependent on Q m h 2 
and H 2 most heavily dependent on fi^/i 2 — however, they 
both also depend on the tilt. In other words, three cos- 
mological parameters come together to form two observ- 
ables. Thus to obtain a corresponding three-parameter 
set, we factor out the tilt-dependence from H 2 (Page et 
al. [4]) and H 3 (Hu et al. [18])and create the variable set 
{h 2l h 3l t}, where 



and 



Fa/ 3 - 6 ™ 3 " 1 = 2 - 17 (! + K/0.044) 2 ) 1 x 



,0.59 



(1 + 1.63(1 - Wb/0.071))" 



(24) 



The parameter t is given by a slight modification of the 
formula used by [21] in order to minimize correlation with 

UJ h 



t 



V 0.024/ 



(25) 



3. The amplitude at the pivot point 

For the amplitude we again use the choice of [21]. 
This choice removes the near perfect degeneracy with 
the opacity e~ 2r due to a non-zero optical depth. It also 
evaluates the amplitude at a more optimal pivot point 
rather than at the arbitrary k = 0.05Mpc _1 , such as to 
remove degeneracy with the tilt n s . However we make 
two modifications: We change the choice of pivot-point, 
as this is dependent on the data set and needs to be up- 
dated to optimize results for WMAP. We also remove 
a strong correlation with u m empirically. The resulting 
formula is 



A* = A s e 



-1r 



^pivot 



-0.568 



where h 



pivot 



(26) 

0.041Mpc _1 (this choice minimizes 



A A* I A* using WMAP temperature and polarization in- 
formation; the corresponding optimal value is k P i VO t = 
0.037Mpc _1 using WMAP temperature information 
alone). 



4- The non-vanilla parameters, ua, a and r 

For the 7 parameter case where the assumption of spa- 
tial flatness is relaxed, the choice of an extra normal pa- 
rameter is not obvious. Since we use S as one of our 
parameters, in principle any of {^a, fife, h} could be used. 
Since there is now an extra free component in the Fried- 
man equation, we choose instead to go with the physical 
dark energy density u>q = Q,qIi 2 . This has most of the 
desirable properties we seek, and consequently it gives 
very small errors in the polynomial fit. We could in fact 
equally well have chosen the physical curvature density 
LOk = Vlkh 2 as this gives very similar results. Both per- 
form significantly better than any of the above three. 
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The running of the tilt is defined as a = dn/d\nk. 
When taken as a free 7th parameter, it has a distribu- 
tion function which is very nearly Gaussian (seen in fig 
6). Thus we use this parameter directly as a normal pa- 
rameter. 

For the tensor contribution, we use as our normal pa- 
rameter the tensor to scalar ratio, r = A t /A s , where A t 
and A s are defined earlier as the CMBfast tensor and 
scalar fluctuation amplitudes respectively, evaluated at 
k = 0.05Mpc _1 . 



III. RESULTS 

In this section, we present the results of the fits. We 
estimate the errors in the fitting and in particular display 
the marginalized likelihoods obtained by running Markov 
Chain Monte Carlo chains using the our fits in place of 
CMBfast. 



A. Fitting accuracy 

In principle a data set may be fitted to any accuracy 
by using a polynomial of sufficiently high order. How- 
ever, this will introduce unphysical polynomial artifacts, 
which will ruin the method's applicability. As explained 
in section II, we therefore split our data into a training 
set and a test set. This allows us to identify the optimal 
order of the polynomial that we fit to. 

This approach is illustrated for the 6 parameter ACDM 
case in figure 1, where we plot the fitting accuracy for 
2nd through 7th order polynomials, showing the differ- 
ence between test and training sets. The training set er- 
rors predictably fall with increasing polynomial degree, 
as there are more degrees of freedom with which to fit 
the data. The test set errors, however, have a clear min- 
imum (in this case for n = 6) which is what we choose as 
optimal polynomial fit. This optimal polynomial degree 
depends strongly on how long a chain is used for the fit. 
A large sample allows us to go to higher order, whereas 
for a small sample, even 3rd or 4th order polynomials 
may be ill fated. 

Due to theoretical prejudice for the probability dis- 
tributions to be close to Gaussian as well as relatively 
smooth, it is possible that some of the errors in fitting to 
the CMBfast-calculated likelihoods could be due to in- 
accuracies in CMBfast itself rather than our polynomial 
approximation, although we have not attempted to test 
or quantify this, this may be worth exploring in future 
work. 

Figure 1 also shows that we obtained slightly larger 
random scatter when computing power spectra with 
DASh [27] instead of CMBfast. This does not to appear 
to be a limitation of DASh itself, since the accuracy is 
greatly improved when using the latest version of DASh 



with transfer functions from the latest version of CMB- 
fast (Knox 2003, private communication). From here on, 
we use CMBfast to calculate all our likelihood samples. 

The mean scatter in the values of ln£ range from a 
good 0.05 for the 6th order fit to the ACDM 6 parame- 
ter model to a more dubious 0.69 for the 6 parameter + 
tensor perturbation case. Accuracies for all the cases are 
shown in table II. However it is not immediately clear 
that the error in hiC is necessarily the most interest- 
ing quantity, as it may include contributions at low ln£ 
which will have negligible impact on the relevant parts 
of the likelihood surface. 

An equally interesting quantity is AC — Cfu — C 
(we normalize C to equal unity at its maximum), which 
shrinks rapidly with decreasing values for ln£ and thus 
better illustrates what kind of accuracies we can expect 
for the marginalized distributions. The mean scatter in 
the values of C are significantly smaller, typically of or- 
der ~ 0.01 for the entire dataset. A better understanding 
of these quantities can be obtained by studying figure 2. 
It shows first how well (or rather how badly) the ac- 
tual likelihood surface (both in terms of C and ln£) is 
described by a Gaussian PDF in the transformed nor- 
mal parameters. These are the variables z defined in 
section II C, which have zero mean and identity covari- 
ance matrix, and the d-dimcnsional radius is given by 

r = |z| = {z\ H V zj) 1 / 2 . The plot then shows the 

fitting errors AC = Cfu — C and Aln£ = hiCfu — ln£, 
plotted as errors relative to a Gaussian distribution. This 
visualizes the ability of the method to reproduce the like- 
lihood surface with good precision and the dramatic im- 
provement gained from going to polynomial degree higher 
than two. 



B. Application to MCMC 

The ultimate end-to-end test of the method is how well 
the 1-dimensional and 2-dimensional marginal distribu- 
tions are reproduced. To test this, we use the polyno- 
mial fits to calculate the likelihoods for our MCMC algo- 
rithm. Due to the inevitability of polynomial artifacts at 
low confidence areas, we use a cut-off at the 3 sigma level 
(corresponding to a maximum value for r defined above) , 
where we replace the polynomial fit by a simple Gaussian. 
This allows for the algorithm to find its way through the 
burn-in process to the allowed part of parameter space 
from any starting point. The results are shown in figures 
3-7. The figures show what is already indicated in table 
II: The marginalized distributions are reproduced to well 
within the sampling errors, the only exceptions being a 
couple of parameters in the tensor case, demanding only 
negligible computational time (a chain of length 200000 
runs in about a minute). 

Figure (8) illustrates one of the most useful applica- 
tions of our approach: combining WMAP with another 
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Training set w/ CMBFAST 
Test set w/ CMBFAST 

Training set w/ DASh / 
Test set w/ DASh 




4 5 
Polynomial order, n 



FIG. 1. R.m.s. fitting error A\n£ for the 6 parameter ACDM case. The plot compares the training and test sets for fits 
based on chains using CMBfast and chains using DASh. For the particular chain-length used for the CMBfast case, we see 
that a 6th degree fit is optimal. 
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TABLE II. R.m.s. polynomial fit errors for all the considered cases, ranging from 6 parameter vanilla ACDM models to 
spiced up models including curvature, tensor perturbations, running tilt. 



data set, here the SDSS galaxy power spectrum as re- 
ported in [7,8] to give constraints on the cosmological 
parameters for a non-flat 7 parameter model. The red 
dashed curve shows the results from running a Markov 
Chain Monte Carlo for about a week obtaining only 14000 
points, and the solid black curve is the reproduction by 
our polynomial fit, taking a mere afternoon to run. In- 
deed, the time-consuming part in this calculation was 
the computation of the non-linear matter power spec- 
trum, the processor time needed to calculate the WMAP 
likelihoods being under a minute. 



IV. CONCLUSIONS AND DISCUSSION 

The big bottleneck in parameter estimation with CMB 
data has hitherto been the computation of theoreti- 
cal power spectra by means of CMBfast, CAMB, CM- 
BEASY, DASh or similar, for a reasonably fast computer 



typically requiring from 5 seconds to several minutes for 
non-flat models with massive neutrino for CMBfast, and 
1 second for DASh. In contrast, our method requires 
less than a millisecond per model, since we have already 
precomputed the WMAP likelihoods using CMBfast and 
distilled the results into a convenient fitting function. Af- 
ter transforming into a physically motivated parameter 
set, the only calculation needed is the evaluation of an 
n th order polynomial with as few as 210 coefficients for 
the 6 parameter quartic case. As the number of coeffi- 
cients may actually exceed the number of measured C;s 
the method is clearly not a way of compressing the data. 
Rather we are compressing the amount of calculation nec- 
essary to convert cosmological parameters into likelihood 
values. 

The typical error in this approach is AC ~ 0.01 (with 
the peak likelihood normalized to unity) and A In £ ~ 
0.05 — 0.5. To place these inaccuracies in context, let us 
first discuss how they compare with the inaccuracies in 
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FIG. 2. The accuracy of our method for the case of a 5th order polynomial fit to the 7-parameter case including a is shown 
for the likelihood C (top) and its logarithm ln£ (bottom). The left panels show how well the likelihood is fit by a pure 
Gaussian, as a function of the radius r in the transformed parameter space. The right panels show the corresponding errors 
AC = Cfu — C (top) and Aln£ = In Cfu — ln£ for our polynomial fit, plotted relative to the Gaussian, i.e., the top right 
panel shows C ga uss + Cfu — C. 



other methods, then the question of how good is good 
enough. 

The latest version of CMBfast [23] has an accuracy 
down to ~ 0.1% (r.m.s). It is unclear if other recent 
packages match this number, but they are generally good 
to the 1 — 2% level. Despite computer advances, the 
CMBfast code is still slow when several hundred thou- 
sand model computations are required as for grid calcula- 
tions or Markov Chain Monte Carlo applications. Several 
shortcuts have therefore been developed in order to com- 
pute the Likelihood faster than the full CMBfast. The 
"fc-split" method in CMBfast due to Tegmark et al. [26] 
utilizes the fact that the high-f and \ow-£ ends of the 
power spectrum depends on different parameters, and so 
calculates the two parts separately. The end result is 
then combined into a final power spectrum. The Davis 
Anisotropy Shortcut (DASh) [27] takes this method fur- 
ther, and creates the power spectrum by interpolation 
between points in a huge pre-computed grid of transfer 
functions. Comparing likelihoods calculated with fc-split 
CMBfast and DASh with those calculated with the max- 
imally accurate CMBfast, we found that the errors from 
our fitting method in both A In £ and AC are signifi- 
cantly smaller than the errors in these quantities from 
the DASh or fc-split approximations. For instance, DASh 
gave r.m.s. A In C w 0.9 for the 6-parameter case using 
only unpolarized WMAP information and the ksplit in- 
accuracies are similar, which should be compared with 
our value Aln£ m 0.08 from Table II. 



How accurate is accurate enough? For most applica- 
tions, the key issue is not inaccuracies in the power spec- 
tra or likelihoods, but inaccuracies in measured cosmo- 
logical parameters. If an approximation shifts the mea- 
sured values of all parameters by much less than their 
statistical uncertainties, it is for all practical purposes 
irrelevant. 

We have also shown that the errors in the marginalized 
distributions are minimal, and that our method should 
be used by anyone who do not have excessive amounts 
of time and computer power on their hands. Our last 
six figures show that our approximation is clearly accu- 
rate enough in this sense except for the above-mentioned 
glitches in the tensor case. There is, however, one sub- 
tle and somewhat counterintuitive point that is worth 
clarifying. Since WMAP constrains the power spectrum 
normalization to the 10 -3 level when all other param- 
eters are held fixed, this means that even a seemingly 
tiny ~ 0.1% inaccuracy in the power spectrum can in 
principle cause a change of order unity in \ 2 an d A ln£, 
i.e., an inaccuracy larger than that of our fitting method, 
and one may naively expect that inaccuracies of order 
A In £ ~ 1 would affect the parameter measurements at 
the lcr level. Although this would be true if only one 
parameter were being measured, the situation at hand is 
actually much better because of degeneracies. As long 
as the inaccuracies do not exactly mimic the effect of 
changing some cosmological parameter, they will alter 
ln£ mainly via the narrowest directions in the multidi- 
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FIG. 6. WMAP ACDM + running scalar tilt, marginalized likelihoods for the polynomial fit compared to the original chain. 
Apart from minor differences in the r distribution, the two distributions are identical. 
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FIG. 8. SDSS + WMAP ACDM + curvature contributions, marginalized likelihoods for the polynomial fit compared to a 
chain consisting of only ~ 14000 points. The fit is excellent, the errors coming from poisson noise due to the short length of 
the CMBfast calculated chain. 
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mensional degeneracy banana and hence have little effect 
on the final results. For instance, if a fitting inaccuracy 
causes a relative error in Of at the 10 -3 level, it will have 
no noticeable effect on the estimates any of the vanilla 
cosmological parameters, whose error bars are dominated 
by the eigenparameters with the largest rather than the 
smallest uncertainties. The bottom line is therefore that 
although the the DASh and fc-split ln£ inaccuracies of 
order unity may sound worrisome, these two approxi- 
mations are nonetheless sufficient to give negligible in- 
accuracies in cosmological parameter measurements, and 
the still better accuracy of our fitting method is actually 
overkill. 

Our method has several applications. First, it is ex- 
tremely useful when combining the WMAP-data with 
non-CMB data such as the galaxy surveys, weak lens- 
ing data, supernovae data, etc., where likelihood calcu- 
lations are fast relative to CMB power spectrum calcu- 
lations. Using our method to measure parameters from 
the combined WMAP and SDSS data [8] enabled us to do 
some of the analysis with dramatically increased speed 3 . 
Second, even when running Markov chains for models 
not considered here, one can get good results by simply 
running the chain for long enough to acquire a sufficient 
(but not necessarily statistically fair) sample of the sur- 
face, then compute the polynomial fit, and go on to use 
this fit for the remainder of the job until all relevant 
mixing and convergence tests are fulfilled. A further ob- 
vious application is generating extremely long Markov 
chains, eliminating sampling errors almost completely 
Finally, should the reader be intent on requiring the best 
accuracy that CMBfast can offer, our approach can still 
be helpful: It is a fundamental fact about MCMC algo- 
rithms that they do not produce completely uncorrelated 
points, and the correlation length of the chains can easily 
be several hundred points, and only gets as low as 45 even 
for our simply 6-parameter WMAP chains. Thus the cal- 
culations may be significantly accelerated by creating a 
statistically random sample by means of the polynomial 
fit, thinning the chain to every 200 points or so, and cal- 
culating the CMBfast power spectrum and WMAP likeli- 
hoods for these points. A final likelihood sample with the 
strictly correct distribution may then be obtained using 
importance sampling as described in [32]. 

We supply Fortran routines for computing the 
likelihoods for all the cases given in the text 
at http://www.hep.upenn.edu/~sandvik/CMBfit.html , 
and we plan to complement this work with further mod- 
els and fits in the future. 
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